Text File | 1995-06-28 | 3KB | 84 lines
Semantic Tokens
Previous: <Context Dependency=>ContextDep> * Next: <Lexical Tie-ins=>LexicalTie> * Up: <Context Dependency=>ContextDep>
#Wrap on
{fH3}Semantic Info in Token Types{f}
The C language has a context dependency: the way an identifier is used
depends on what its current meaning is. For example, consider this:
#Wrap off
#fCode
foo (x);
#f
#Wrap on
This looks like a function call statement, but if {fCode}foo{f} is a typedef
name, then this is actually a declaration of {fCode}x{f}. How can a Bison
parser for C decide how to parse this input?
The method used in GNU C is to have two different token types,
{fCode}IDENTIFIER{f} and {fCode}TYPENAME{f}. When {fCode}yylex{f} finds an
identifier, it looks up the current declaration of the identifier in order
to decide which token type to return: {fCode}TYPENAME{f} if the identifier is
declared as a typedef, {fCode}IDENTIFIER{f} otherwise.
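The lookup that {fCode}yylex{f} performs can be sketched in C as follows. This is only a minimal illustration, not the GNU C implementation: the helper names {fCode}declare{f} and {fCode}classify\_identifier{f}, the fixed-size table, and the numeric token codes are all assumptions made for the example. In a real parser the token codes come from Bison's {fCode}%token{f} declarations, and the table would also have to track scopes.

```c
#include <string.h>

/* Hypothetical token codes; a real parser takes these from the
   header that Bison generates for its %token declarations. */
enum token_type { IDENTIFIER = 258, TYPENAME = 259 };

/* Hypothetical symbol table: one flat, fixed-size array with no scoping. */
#define MAX_SYMBOLS 64

struct symbol {
    const char *name;
    int is_typedef;   /* nonzero if the current declaration is a typedef */
};

static struct symbol symbols[MAX_SYMBOLS];
static int symbol_count = 0;

/* Record or update the current declaration of NAME.
   The parser's actions would call this as declarations are reduced. */
static void declare(const char *name, int is_typedef)
{
    for (int i = 0; i < symbol_count; i++) {
        if (strcmp(symbols[i].name, name) == 0) {
            symbols[i].is_typedef = is_typedef;
            return;
        }
    }
    if (symbol_count < MAX_SYMBOLS) {
        symbols[symbol_count].name = name;
        symbols[symbol_count].is_typedef = is_typedef;
        symbol_count++;
    }
}

/* The decision yylex makes after scanning an identifier:
   TYPENAME if it is currently declared as a typedef, IDENTIFIER otherwise. */
static int classify_identifier(const char *name)
{
    for (int i = 0; i < symbol_count; i++) {
        if (strcmp(symbols[i].name, name) == 0 && symbols[i].is_typedef)
            return TYPENAME;
    }
    return IDENTIFIER;
}
```

In this sketch an undeclared name is reported as {fCode}IDENTIFIER{f}, and redeclaring a name as a non-typedef switches its token type back to {fCode}IDENTIFIER{f}.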
The grammar rules can then express the context dependency by the choice of
token type to recognize. {fCode}IDENTIFIER{f} is accepted as an expression,
but {fCode}TYPENAME{f} is not. {fCode}TYPENAME{f} can start a declaration, but
{fCode}IDENTIFIER{f} cannot. In contexts where the meaning of the identifier
is {fEmphasis}not{f} significant, such as in declarations that can shadow a
typedef name, either {fCode}TYPENAME{f} or {fCode}IDENTIFIER{f} is
accepted---there is one rule for each of the two token types.
This technique is simple to use if the decision of which kinds of
identifiers to allow is made at a place close to where the identifier is
parsed. But in C this is not always so: C allows a declaration to
redeclare a typedef name provided an explicit type has been specified
earlier:
#Wrap off
#fCode
typedef int foo, bar, lose;
static foo (bar); \/\* redeclare {fCode}bar{f} as static variable \*\/
static int foo (lose); \/\* redeclare {fCode}foo{f} as function \*\/
#f
#Wrap on
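As a small compilable illustration of such a redeclaration, the sketch below redeclares {fCode}bar{f} inside a function body, where the new declaration shadows the typedef. The function names {fCode}shadow\_demo{f} and {fCode}typedef\_still\_visible{f} are hypothetical, chosen only for this example.

```c
typedef int foo, bar;

/* Inside this function, `bar` is redeclared as a variable.
   The declaration is accepted because an explicit type (`foo`)
   precedes the parenthesized declarator. */
int shadow_demo(void)
{
    static foo (bar);   /* `bar` is now a static variable of type foo (int) */
    bar = 5;            /* legal only because `bar` names a variable here */
    return bar;
}

/* Outside that scope, `bar` is still the typedef name. */
int typedef_still_visible(void)
{
    bar n = shadow_demo();   /* `bar` used as a type again */
    return n + 1;
}
```

While {fCode}shadow\_demo{f} is being parsed, the lexer has to return {fCode}TYPENAME{f} for the first {fCode}bar{f} it meets and {fCode}IDENTIFIER{f} for the later uses, which is why the grammar must accept a typedef name in the declarator position.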
Unfortunately, the name being declared is separated from the declaration
construct itself by a complicated syntactic structure---the ``declarator''.
As a result, part of the Bison parser for C needs to be duplicated, with
all the nonterminal names changed: once for parsing a declaration in which
a typedef name can be redefined, and once for parsing a declaration in
which that can't be done. Here is a part of the duplication, with actions
omitted for brevity:
#Wrap off
#fCode
initdcl:
          declarator maybeasm '='
          init
        | declarator maybeasm
        ;

notype\_initdcl:
          notype\_declarator maybeasm '='
          init
        | notype\_declarator maybeasm
        ;
#f
#Wrap on
Here {fCode}initdcl{f} can redeclare a typedef name, but {fCode}notype\_initdcl{f}
cannot. The distinction between {fCode}declarator{f} and
{fCode}notype\_declarator{f} is the same sort of thing.
There is some similarity between this technique and a lexical tie-in
(described next), in that information which alters the lexical analysis is
changed during parsing by other parts of the program. The difference is that
here the information is global, and is used for other purposes in the
program. A true lexical tie-in has a special-purpose flag controlled by
the syntactic context.